ABSTRACT



An apparatus and method for implementing object trajectory 
segmentation for an image sequence. Specifically, block- 
based motion vectors for a pair of adjacent frames are used 
to derive optical flow, e.g., affine, motion parameters. The 
object trajectory segmenter applies the optical flow motion 
parameters to form a new prediction or method for predict- 
ing the positions of all the points on an object over time 
within an interval. The new prediction is then applied and 
the result is compared with an error metric. The results from 
such comparison with the error metric will dictate the proper 
intervals (temporal boundaries) of the image sequence at 
which the motion parameters are valid for various key 
objects. 
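The first step summarized above, deriving affine motion parameters from the block-based motion vectors of a pair of adjacent frames, can be illustrated with a minimal Python sketch. It assumes that each selected block supplies its centre (x, y) in the previous frame and a motion vector (dx, dy) pointing to its position (u, v) = (x + dx, y + dy) in the prediction frame. It then fits the six-parameter affine model u = a1*x + a2*y + a3, v = a4*x + a5*y + a6 by ordinary least squares. The least-squares formulation, the motion-vector sign convention, and the names fit_affine and solve are illustrative assumptions rather than the patented method.

```python
from typing import List, Sequence, Tuple

Block = Tuple[float, float, float, float]  # (x, y, dx, dy), hypothetical layout


def solve(m: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting; raises on a singular system."""
    n = len(b)
    a = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("blocks do not constrain all six affine parameters")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_affine(blocks: Sequence[Block]) -> Tuple[float, ...]:
    """Least-squares (a1..a6) with u = a1*x + a2*y + a3 and v = a4*x + a5*y + a6."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for x, y, dx, dy in blocks:
        row = (x, y, 1.0)
        u, v = x + dx, y + dy  # block position in the prediction frame
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # u and v share the same (x, y, 1) design, so two 3x3 normal systems suffice.
    return tuple(solve(ata, atu) + solve(ata, atv))
```

Because both halves of the model share one normal matrix, at least three non-collinear blocks are needed for it to be invertible.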

20 Claims, 7 Drawing Sheets 
APPARATUS AND METHOD FOR DESCRIBING THE MOTION PARAMETERS OF AN OBJECT IN AN IMAGE SEQUENCE

This application claims the benefit of U.S. Provisional Application No. 60/117,649 filed on Jan. 28, 1999, which is herein incorporated by reference.

The invention relates to image processing for describing the motion of object(s) in image sequences, e.g., video. More particularly, the invention relates to an efficient framework for object trajectory segmentation, which, in turn, can be employed to improve image processing functions, such as context-based indexing and retrieval of image sequences with emphasis on motion description.

BACKGROUND OF THE DISCLOSURE

With the explosion of available multimedia content, e.g., audiovisual content, the need for organization and management of this ever-growing and complex information becomes important. Specifically, as libraries of multimedia content continue to grow, it becomes unwieldy to index this highly complex information to facilitate efficient retrieval at a later time.

By standardizing a minimum set of descriptors that describe multimedia content, content present in a wide variety of databases can be located, thereby making search and retrieval more efficient and powerful. International standards bodies such as the Moving Picture Experts Group (MPEG) have embarked on standardizing such an interface that can be used by indexing engines, search engines, and filtering agents. This new member of the MPEG standards is named the multimedia content description interface and has been code-named "MPEG-7".

For example, a typical content description of a video sequence can be obtained by dividing the sequence into "shots". A "shot" can be defined as a sequence of frames in a video clip that depicts an event and is preceded and followed by an abrupt scene change or a special-effect scene change such as a blend, dissolve, wipe or fade. Detection of shot boundaries enables event-wise random access into a video clip and thus constitutes the first step towards content search and selective browsing. Once a shot is detected, representative frames called "key frames" are extracted to capture the evolution of the event, e.g., key frames can be identified to represent an explosion scene, an action chase scene, a romantic scene and so on. This reduces the complex problem of processing many video frames of an image sequence to processing only a few key frames. The existing body of knowledge in low-level abstraction of scene content such as color, shape, and texture from still images can then be applied to extract the meta-data for the key frames.

While offering a simple solution to extract meta-data, the above description contains no motion-related information. Motion information can considerably expand the scope of queries that can be made about content (e.g., queries can have "verbs" in addition to "nouns"). Namely, it is advantageous to correlate the known information based on color, shape, and texture descriptors with motion information, so as to convey a more intelligent description of the dynamics of the scene that can be used by a search engine. Instead of analyzing a scene from a single perspective and storing only the corresponding meta-data, it is advantageous to capture relative object motion information as a descriptor that will ultimately support fast analysis of scenes on the fly from different perspectives, thereby supporting a wider range of unexpected queries. For example, this can be very important in application areas such as security and surveillance, where it is not always possible to anticipate the queries.

Therefore, there is a need in the art for an apparatus and method for extracting and describing motion information in an image sequence, thereby improving image processing functions such as content-based indexing and retrieval, and various encoding functions.

SUMMARY OF THE INVENTION

One embodiment of the present invention is an apparatus and method for implementing object trajectory segmentation for an image sequence, thereby improving or offering other image processing functions such as context-based indexing of the input image sequence by using motion-based information. More specifically, block-based motion vectors are used to derive optical flow motion parameters, e.g., affine motion parameters. These optical flow motion parameters are employed to develop a prediction that is used to effect object trajectory segmentation for an image sequence.

Specifically, optical flow (e.g., affine) object motion segmentation is initially performed for a pair of adjacent frames. Namely, optical flow motion parameters between adjacent frames that describe the position of each point on a region at each time instant are made available to the present object trajectory segmenter. The present invention is not limited by the method or model that is employed to provide the initial optical flow motion parameters between adjacent frames.

The object trajectory segmenter applies the optical flow motion parameters to form a new prediction or method for predicting the positions of all the points on an object over time within an interval. For example, the optical flow motion parameters are code fitted to form the new prediction. The new prediction is then applied and the result is compared with an error metric. For example, the error metric measures the sum of deviations in distance at each point on the region at each time instant based on the new prediction compared to the original predictions. The results from such comparison with the error metric will dictate the proper intervals (temporal boundaries) of the image sequence at which the motion parameters are valid for various key objects. In other words, it is important to detect what the motion segments or temporal boundaries are for a key object. In doing so, the present object trajectory segmenter obtains two sets of important information: the motion parameter values that accurately describe the object's motion, and the frames for which the parameters are valid.

Namely, the optical flow (e.g., affine) motion parameters generated for each identified key object for each adjacent pair of frames are processed over an interval of the image sequence to effect object trajectory segmentation. Namely, motion trajectory such as direction, velocity and acceleration can be deduced for each key object over some frame interval, thereby providing another aspect of motion information that can be exploited by query.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts a block diagram of a content server of the present invention;

FIG. 2 depicts a block diagram of a context-based indexer of the present invention;

FIG. 3 depicts a flowchart of a method for implementing context-based indexing of an input image sequence by using motion-based information;

FIG. 4 depicts a flowchart of a method for implementing optical flow (e.g., affine) object motion segmentation;

FIG. 5 depicts a flowchart of a method for implementing optical flow (e.g., affine) trajectory segmentation;

FIG. 6 illustrates code fitting of optical flow motion parameters to generate trajectory parameters; and

FIG. 7 illustrates a block diagram of an example of which frames might be the temporal split points for an object that exists in a video sequence comprising 20 frames.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

FIG. 1 depicts a block diagram of a content server 100 of the present invention. In one embodiment, the content server 100 is implemented using a general purpose computer. Thus, illustrative content server 100 comprises a processor (CPU) 140, a memory 120, e.g., random access memory (RAM), a context-based indexer 150, optional encoder(s) 110 and various input/output devices 130 (e.g., a keyboard, a mouse, an audio recorder, a camera, a camcorder, a video monitor, any number of imaging devices or storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive).

It should be understood that the encoder(s) 110 and the content-based indexer 150 can be implemented jointly or separately. Namely, the encoder(s) 110 and the content-based indexer 150 can be physical devices that are coupled to the CPU 140 through a communication channel. Alternatively, the encoder(s) 110 and the content-based indexer 150 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using application specific integrated circuits (ASIC)), where the software is loaded from a storage medium (e.g., a magnetic or optical drive or diskette) and operated by the CPU in the memory 120 of the computer. As such, the encoder(s) 110 and the content-based indexer 150 (including associated data structures) of the present invention can be stored on a computer readable medium, e.g., RAM memory, magnetic or optical drive or diskette and the like.

In operation, various multimedia information is received on path 105 and stored within a storage device 130 of the content server 100. The multimedia information may include, but is not limited to, various image sequences such as complete movies, movie clips or shots, advertising clips, music videos and the like. The image sequences may or may not include audio streams or data streams, e.g., closed captioning and the like.

Due to the explosion of available multimedia content and its large size, the input information on path 105 may undergo a compression process that is illustrated as one or more optional encoders 110. The encoders 110 may comprise video and audio encoders, e.g., MPEG-like encoders that are designed to reduce spatial and temporal redundancy. However, any number of compression schemes can be employed, and the present invention is not limited to any particular scheme. The encoders 110 are optional since the input information may have already undergone various compression processes outside of the content server, where the input stream is already in a compressed format. In such an implementation, the encoders 110 can be omitted.

The content-based indexer 150 is employed to analyze the input information and to provide an efficient index to the large quantity of often complex multimedia content that is stored on the storage device(s) 130. The content-based indexer 150 of the present invention is tasked to provide an indexing method and associated data structures that allow the multimedia content to be categorized efficiently and then retrieved quickly. More particularly, the present content-based indexer 150 employs motion information to allow more complex queries that employ "verbs" (e.g., relative motion information of an object), instead of just "nouns" (e.g., the color of an object).

For example, a query for an image sequence containing a blue background, e.g., a blue sky, may generate a large number of positive responses, thereby reducing the effectiveness of the query function. In contrast, if the query can be modified to search for an image sequence containing a blue background with an object moving in the foreground at a high velocity to the left, then the response to the query may produce a highly focused set of positive responses, e.g., an image sequence having an aircraft moving rapidly across a blue sky background.

The content-based indexer 150 comprises an object motion segmenter 160 and an object trajectory segmenter 170. In brief, the object motion segmenter 160 is employed to broadly determine the relative motion of objects within each frame, whereas the object trajectory segmenter 170 is employed to broadly determine the trajectory of the objects within a number of frames within an image sequence.

FIG. 2 depicts a block diagram of a context-based indexer 150 of the present invention comprising an object motion segmenter 160 and an object trajectory segmenter 170. The object motion segmenter 160 comprises a block-based motion estimator 210, an optical flow (e.g., affine) segmenter 212, a key object tracker 214, a key object splitter 216, and an optical flow (e.g., affine) segmenter 218. The object trajectory segmenter 170 comprises a key object trajectory segmenter 220 and a sub-object trajectory segmenter 222. The broad functions performed by these modules are briefly described with reference to FIG. 2. Detailed descriptions of these functions are provided below with reference to the flowcharts and other diagrams of FIGS. 3-6.

In operation, an image sequence is received into block-based motion estimator 210, where motion information, e.g., block-based motion vectors, is computed from the image sequence for each frame. However, if the content server 100 has an external encoder 110 or the input image sequence already contains motion information, i.e., where the motion vectors are encoded with the image sequence, then block-based motion estimator 210 can be omitted. Namely, the block-based motion information can be extracted from the compressed bitstream itself or provided by other modules of the content server 100, thereby relieving the object motion segmenter 160 from having to compute the motion vectors.

In turn, the optical flow (e.g., affine) segmenter 212 applies the motion vector information to generate "affine motion parameters". Although the present invention is described below using the affine motion model, it should be understood that other optical flow models can be employed as well. The affine motion model is disclosed by J. Nieweglowski et al. in "A Novel Video Coding Scheme Based On
Temporal Prediction Using Digital Image Warping", IEEE Trans. Consumer Electronics, Vol. 39, No. 3, pp. 141-150, August 1993, which is incorporated herein by reference. The affine motion model constructs a prediction image or frame from a previous image by applying a geometric transformation known as "image warping". The transform specifies a spatial relationship between each point in the previous and prediction images.

Generally, motion compensation using block matching provides good overall performance for translational motion. However, block-matching motion estimation performs poorly when the motion contains rotational or scaling components (e.g., zooming or rotating an image).

In contrast, the affine motion model (affine transformation) is defined by six parameters (a_1 to a_6) and is expressed as:

$$u = a_1 x + a_2 y + a_3, \qquad v = a_4 x + a_5 y + a_6 \qquad (1)$$

where (x, y) are pixel coordinates in the previous frame and (u, v) are the coordinates of a given pixel in the prediction frame. A detailed discussion on determining the six parameters is presented in the J. Nieweglowski et al. reference. Because the affine relationship is characterized by six parameters rather than by a single translation vector, the affine motion model is generally more effective in predicting motions such as translation, scaling, and rotation, which are often observed not only in natural sequences but also in synthetic scenes using digital effects.

Namely, the affine segmenter 212 is tasked with the identification, segmentation, and generation of affine parameters for the "key objects" for each frame of the image sequence. Key objects can be viewed as objects that are sufficiently significant that tracking their motion is important for the purpose of indexing or other image processing functions. Typically, key objects are identified in part based on their size, i.e., large objects are typically key objects, whereas small objects are not. Thus, a moving vehicle is typically a key object, whereas a small insect moving in the background is not. Nevertheless, the requirements for qualifying key objects are application specific and are defined by the user of the present invention. Once key objects are defined, the motions of these key objects are then tracked by key object tracker 214.

Optionally, if the motion information of components of each key object is also important for the purpose of indexing or other image processing functions, additional processing is performed by the key object splitter 216. Specifically, a key object can be segmented into sub-objects and the motion information for these sub-objects can be tracked individually. For example, a key object of a human being can be segmented into six sub-objects comprising a head, a body and four limbs. Thus, a query can now be crafted to search for "actions" that are relative to sub-objects within a key object, e.g., searching for an image sequence where a limb of a person is raised above the head of the person and so on.

Although some key objects can be readily split into well-defined sub-objects, other key objects may require further processing to identify the boundaries of sub-objects. Thus, the key object information can be forwarded from the key object tracker 214 directly to an affine segmenter 218 for identification and segmentation of "sub-objects" for each key object. Thus, affine segmenter 218 is also tasked with generating affine motion parameters for the sub-objects. It should be noted that although the content-based indexer illustrates two affine segmenters 212 and 218, a single affine segmenter can be implemented to perform both levels of affine processing (i.e., key object and sub-object processing).

In turn, the motion information from key object tracker 214 is forwarded to a key object trajectory segmenter 220. Although it is possible to maintain and track the motion information, e.g., the affine motion parameters, for each key object, it has been found that storing such motion information requires substantial storage. Thus, the motion information for each key object is forwarded to the key object trajectory segmenter 220, where motion trajectory information and intervals (frame intervals) are generated for each key object. Namely, the motion information is summarized into "key object trajectory information", e.g., direction, velocity, acceleration and the like within some defined intervals (over a number of frames). This allows the motion information to be captured and stored in a format that allows for efficient motion-based indexing (or other image processing) of multimedia content. Optionally, motion information for each sub-object is forwarded to the sub-object trajectory segmenter 222, where motion trajectory information and intervals (frame intervals) are generated for each sub-object.

FIG. 3 depicts a flowchart of a method 300 for implementing affine segmentation, thereby improving or offering other image processing functions such as context-based indexing of an input image sequence by using motion-based information. More specifically, method 300 starts in step 305 and proceeds to step 310 where affine object motion segmentation is performed. Namely, key objects are identified within some intervals of the image sequence (also known as a "shot" having a number of frames of the input image sequence) and their motion information is extracted and tracked over those intervals. In step 310, affine motion parameters are generated for each identified key object.

In step 320, the affine motion parameters generated for each identified key object for each adjacent pair of frames are processed over an interval of the image sequence to effect object trajectory segmentation. Namely, motion trajectory such as direction, velocity and acceleration can be deduced for each key object over some frame interval, thereby providing another aspect of motion information that can be exploited by query. Method 300 then ends in step 325.

FIG. 4 depicts a flowchart of a method 310 for implementing affine object motion segmentation. Namely, method 310 is a more detailed description of step 310 of FIG. 3. Method 310 starts in step 405 and proceeds to step 407, where method 310 generates affine motion parameters from block-based motion information. Namely, a random number of blocks are selected, and their block-based motion vectors are employed to derive affine motion parameters as discussed below.

In step 410, method 310 attempts to track one or more identified key objects from a previous frame, i.e., obtain the label of a key object from a previous frame. Namely, once key objects have been identified for a pair of adjacent frames as discussed in step 420 below, it may not be necessary to apply the same detection step again for the next frame in the image sequence. Namely, for a new frame, block-based motion vectors can be employed to rapidly look backward to see whether the blocks point to a previously labeled key object. If so, such blocks will retain the same labeling as in the previous frame. For example, if four (4) key objects have been determined for a pair of adjacent frames and if the next frame contains five (5) connected regions, then motion
vectors are employed to determine whether four out of the five connected regions in the present frame correlate to the four identified key objects. If so, then only the remaining single connected region is tested in accordance with step 420 to determine whether it is a key object. This method of tracking significantly reduces the computational overhead, i.e., key objects are tracked in the image sequence until the objects can no longer be tracked, e.g., the object has become too small or occluded. However, if no key objects can be identified, step 410 is skipped, as in the case where method 310 processes a new shot.

In step 420, method 310 identifies the key objects within a pair of adjacent frames of the input image sequence. In one embodiment, the key objects are identified by determining whether a block is within the affine object flow.

In step 430, method 310 may optionally merge identified key objects. For example, the identified key objects may be too small to be significant for image processing purposes, i.e., indexing in a particular application, e.g., a single bird can be merged with other birds within the frame to form a key object of a flock of birds.

In step 440, method 310 may optionally identify sub-objects within key objects. In contrast to step 430 above, the identified key object may have significant motion information associated with its components (sub-objects) for indexing purposes in a particular application. For example, an identified key object comprising a human being may comprise sub-objects, e.g., the person's limbs, where the relative motion information of the sub-objects is important for indexing the shot.

In step 450, method 310 queries whether there are additional frames associated with the present "shot". If the query is negatively answered, then method 310 ends in step 455. If the query is positively answered, then method 310 proceeds to step 407, where the affine motion parameters are generated for the next pair of adjacent frames. Namely, affine segmentation has to be performed between successive pairs of frames, because the affine motion parameters are needed at each instant to model the trajectory and also to handle new objects/occlusions.

Method 310 then ends in step 455 when all frames have been processed. A detailed description of a novel object motion segmenter 160 is provided in the U.S. patent application entitled "Apparatus And Method For Context-Based Indexing And Retrieval Of Image Sequences" with attorney docket SAR 13430, which is herein incorporated by reference and is filed simultaneously herewith.

FIG. 5 depicts a flowchart of a method 320 for implementing optical flow (e.g., affine) trajectory segmentation. More specifically, method 320 is a more detailed description of step 320 of FIG. 3, and the affine motion model is employed to describe the present invention.

Method 320 starts in step 505 and proceeds to step 510, where affine motion parameters between adjacent frames that describe the position of each point on a region at each time instant are obtained as discussed above in FIG. 4. It should be noted that various methods exist for determining the motion information for regions within a frame, e.g., various optical flow techniques. As such, the present invention is not limited by a particular method or model that is employed to provide the initial optical flow motion parameters between adjacent frames of an interval of the image sequence. Namely, the present trajectory segmenter assumes that a "segmentation" or delineation between objects has been previously computed and is known.

In step 520, method 320 models at least one of the affine motion parameters for a particular or selected interval. For example, affine motion parameters are derived between a subsampled set of frames in the chosen interval, depending on the order of the fit, e.g., quadratic expressions require at least 2 data sets, etc. Specifically, the affine motion parameters are decomposed into their components, namely, scale, rotation, shear, and translation, and a different temporal model is assumed for each component depending on its nature. For example, the translation can be modeled using a polynomial, e.g., quadratic in time. The scale can be modeled to vary linearly over time. The rotation can be modeled using a constant angular velocity assumption.

In one embodiment, a model is used that can describe the average position of the object in each frame, {x(t), y(t)}, as follows:

$$x(t) = 0.5\,a_x t^2 + v_x t + x_0, \qquad y(t) = 0.5\,a_y t^2 + v_y t + y_0 \qquad (2)$$

where a_x and a_y represent acceleration, v_x and v_y represent velocity, and x_0 and y_0 represent the initial position. The method begins with the initial estimate that the entire scene can be described by one set of motion parameters. In other words, the selected interval is initially the entire image sequence.

In step 530, method 320 computes the trajectory parameters for the selected interval using code fitting, i.e., using the subsampled affine motions to obtain the coefficients for the chosen prediction model, represented by curve(s) 610, as shown in FIG. 6. Namely, a new parametric expression or prediction is developed to predict the position of all the points on the object over time within a selected interval.

In step 540, method 320 computes and sums the errors for each key object for at least one of the affine motion parameters. Namely, an error metric is chosen that measures the sum of deviations in distance at one or more points on the region at each time instant based on this new prediction as compared to the original positions.

In step 550, method 320 queries whether the summed error is greater than a threshold "T1". If the query is positively answered, then method 320 proceeds to step 560, where the selected interval is split or divided into two intervals at the location of maximum frame error. If the query is negatively answered, then method 320 proceeds to step 570.

Namely, the new trajectory parameter set is calculated. If the average error (averaged over the time interval for which the model is valid) exceeds a threshold T1, then the sequence or selected interval is split into two temporal sections at the frame where there is a maximum error.

In step 570, method 320 queries whether there is a next interval. If the query is positively answered (as in the case when a split operation occurs in step 560), then method 320 returns to step 520, where steps 520-560 are repeated for each new interval, i.e., two new intervals are generated for each split operation. If the query is negatively answered, then all intervals have been evaluated for splitting and method 320 proceeds to step 580. In other words, method 320 continues to calculate the motion parameters of each interval as well as the error for that time interval, and continues to perform a temporal split if the average error exceeds the threshold. The iteration stops either when the time interval is smaller than 3 frames or when the error is below the threshold.

FIG. 7 illustrates a block diagram of an example of which frames might be the temporal split points for an object that exists in a video sequence comprising 20 frames. In this example, Sections or segments 0 and 2 are considered not "valid" because these sections are required to
be split, whereas Sections 1, 3 and 4 are valid since no splitting is required. Only the motion parameters of the valid sections are stored and passed on to the Merge subroutine. It should be noted that the start and end frames of Sections 1, 3, and 4 are contiguous. As described below, it is plausible that Sections 1 and 3 can be combined into a single section whose motion parameters fall within the error threshold.
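The split procedure of steps 520-570, which produces the valid sections described here, can be sketched in Python. This is a minimal sketch under assumptions the text leaves open. The trajectory model is the quadratic position model x(t) = 0.5*a_x*t^2 + v_x*t + x_0 (and likewise for y), fitted by least squares to the object's average position in each frame. The frame error is the Euclidean distance between fitted and observed positions. An interval whose average error exceeds T1 is split so that the first part ends at the frame of maximum error (as in the frames 0-5 / 6-20 example below), with the last frame excluded as a split point so that both parts are non-empty. Intervals shorter than 3 frames are accepted as valid. The names split, fit_section, polyfit and solve are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Section = Tuple[int, int, List[float], List[float]]  # (start, end, coeffs_x, coeffs_y)


def solve(m: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    a = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def polyfit(ts: Sequence[float], vs: Sequence[float], deg: int) -> List[float]:
    """Least-squares polynomial coefficients, lowest power first."""
    k = deg + 1
    m = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(k)]
    return solve(m, b)


def fit_section(pos: Sequence[Point], start: int, end: int):
    """Fit the quadratic position model over frames start..end, with t = frame - start.

    Returns (coeffs_x, coeffs_y, per-frame errors); coeffs are [x0, vx, 0.5*ax].
    """
    ts = list(range(end - start + 1))
    pts = pos[start:end + 1]
    deg = min(2, len(ts) - 1)  # fewer than 3 frames cannot support a quadratic
    cx = polyfit(ts, [p[0] for p in pts], deg)
    cy = polyfit(ts, [p[1] for p in pts], deg)

    def ev(c: List[float], t: int) -> float:
        return sum(ci * t ** i for i, ci in enumerate(c))

    errs = [math.hypot(ev(cx, t) - x, ev(cy, t) - y) for t, (x, y) in zip(ts, pts)]
    return cx, cy, errs


def split(pos: Sequence[Point], start: int, end: int, t1: float) -> List[Section]:
    """Recursively split [start, end] at the max-error frame; return the valid sections."""
    cx, cy, errs = fit_section(pos, start, end)
    if len(errs) < 3 or sum(errs) / len(errs) <= t1:
        return [(start, end, cx, cy)]
    # First part ends at the worst frame; the last frame is excluded so both parts are non-empty.
    k = start + max(range(len(errs) - 1), key=lambda i: errs[i])
    return split(pos, start, k, t1) + split(pos, k + 1, end, t1)
```

Applied to a trajectory with an abrupt change of direction, the sketch returns several contiguous sections, each of which is either shorter than 3 frames or fitted to within T1 on average.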

In step 580, method 320 evaluates each pair of adjacent intervals for potential merging. Specifically, adjacent intervals are merged if a joint prediction model for the two adjacent intervals results in a normalized error below a threshold "T2". This step is repeated recursively for all intervals until any further merging would increase the normalized error in the merged interval above the threshold T2. In other words, once the scene has been split into many time intervals, a merge operation is applied to all the "valid" time intervals. The merge operation looks at successive temporal segments or intervals, calculates a new set of trajectory parameters for the two selected segments, and merges the two segments only if the error (of the parametric motion model of the merged segments) falls below the threshold.
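The merge operation of step 580 can be sketched in Python as follows. This minimal sketch assumes that each section's trajectory is the quadratic position model x(t) = 0.5*a_x*t^2 + v_x*t + x_0 (and likewise for y), fitted by least squares to the object's average position per frame. It also assumes that the "normalized error" is the mean Euclidean distance per frame between the joint fit and the observed positions. Adjacent sections are tried from left to right, and the scan restarts after each merge until no pair can be joined below T2. The names merge, avg_error, polyfit and solve are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Interval = Tuple[int, int]  # (start frame, end frame), inclusive


def solve(m: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    a = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def polyfit(ts: Sequence[float], vs: Sequence[float], deg: int) -> List[float]:
    """Least-squares polynomial coefficients, lowest power first."""
    k = deg + 1
    m = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(k)]
    return solve(m, b)


def avg_error(pos: Sequence[Point], start: int, end: int) -> float:
    """Mean Euclidean error of a joint quadratic fit over frames start..end."""
    ts = list(range(end - start + 1))
    pts = pos[start:end + 1]
    deg = min(2, len(ts) - 1)
    cx = polyfit(ts, [p[0] for p in pts], deg)
    cy = polyfit(ts, [p[1] for p in pts], deg)

    def ev(c: List[float], t: int) -> float:
        return sum(ci * t ** i for i, ci in enumerate(c))

    return sum(math.hypot(ev(cx, t) - x, ev(cy, t) - y) for t, (x, y) in zip(ts, pts)) / len(ts)


def merge(pos: Sequence[Point], sections: Sequence[Interval], t2: float) -> List[Interval]:
    """Merge adjacent sections while the joint fit's normalized error stays below T2."""
    secs = list(sections)
    merged = True
    while merged:
        merged = False
        for i in range(len(secs) - 1):
            s, e = secs[i][0], secs[i + 1][1]
            if avg_error(pos, s, e) < t2:
                secs[i:i + 2] = [(s, e)]
                merged = True
                break  # rescan from the left after every merge
    return secs
```

For a single parabola that has been over-segmented into frames 0-5, 6-8 and 9-20, this sketch collapses the sections into one interval spanning frames 0-20, whereas two sections meeting at a sharp change of direction are left apart.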

Method 320 ends in step 590 when all possible merge operations have been performed. Thus, method 320 of the present invention provides an effective way of describing the motion parameters of an object within a sequence, and of detecting the temporal boundaries at which to update the motion parameters, given a motion model. The resulting descriptors can then be exploited by image processing functions, e.g., applied to a video sequence by an indexing or searching tool to detect objects, or events consisting of object interactions.

An example illustrating the importance of the split and merge operations is as follows. Suppose a sequence consists of frames 0-20, and it has been determined that the best split point is at frame 5. After a split operation is performed at that location, two segments or intervals remain: frames 0-5 and 6-20. If frames 0-5 need not be split, and the best split point for the interval from frames 6-20 occurs at frame 8, then after another split operation three segments remain: frames 0-5, 6-8, and 9-20. Once no segment needs to be split further, a successive check is performed to determine whether adjacent segments can be merged. Thus, in the above example, a check is performed to determine whether the first two segments (frames 0-5 and frames 6-8) can be merged into one segment spanning frames 0-8; if the merge occurs, a total of two segments, frames 0-8 and frames 9-20, is obtained. It should be noted that in the first pass of the split operation, the best split point was at frame 5 and not at frame 8. This discrepancy arises because real objects may not exactly follow a quadratic trajectory, so the point of maximum error does not always correspond to the point where the motion model parameters need to be updated. It should be noted that the above example and the pseudocode below are for the quadratic model of the object's position only.
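One way to locate a split point for the quadratic position model is sketched below. It rests on assumptions not fixed by the description: positions are the per-frame average (x, y) of the object, time is the frame index measured from the start of the interval, x(t) = 0.5 a_x t^2 + v_x t + x_0 (and likewise y(t)) is fitted by ordinary least squares, and the split test is the mean-deviation rule with the 0.9 threshold quoted later in this description. The function names `fit_quadratic`, `deviations` and `max_error_frame` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def fit_quadratic(ts: Sequence[float], vals: Sequence[float]) -> List[float]:
    """Least-squares fit of vals ~ c2*t**2 + c1*t + c0; returns [c2, c1, c0].
    In the trajectory model, c2 = a/2, c1 = v and c0 = the initial position."""
    s = [sum(t ** k for t in ts) for k in range(5)]  # power sums of t
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    r = [sum(v * t * t for t, v in zip(ts, vals)),
         sum(v * t for t, v in zip(ts, vals)),
         float(sum(vals))]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 3):
                m[row][k] -= f * m[col][k]
            r[row] -= f * r[col]
    c = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        c[row] = (r[row] - sum(m[row][k] * c[k] for k in range(row + 1, 3))) / m[row][row]
    return c


def deviations(pos: Sequence[Point], start: int, end: int) -> List[float]:
    """Distance from each observed position to the fitted quadratic trajectory."""
    frames = range(start, end + 1)
    ts = [float(f - start) for f in frames]  # local time keeps the power sums small
    xs = [pos[f][0] for f in frames]
    ys = [pos[f][1] for f in frames]
    cx, cy = fit_quadratic(ts, xs), fit_quadratic(ts, ys)

    def model(c: List[float], t: float) -> float:
        return c[0] * t * t + c[1] * t + c[2]

    return [math.hypot(x - model(cx, t), y - model(cy, t))
            for t, x, y in zip(ts, xs, ys)]


def max_error_frame(pos: Sequence[Point], start: int, end: int,
                    threshold: float = 0.9) -> Optional[int]:
    """Frame of maximum deviation if the mean deviation exceeds threshold, else None."""
    if end - start + 1 <= 3:  # too few frames to split (cf. NumFrames <= 3)
        return None
    dev = deviations(pos, start, end)
    if sum(dev) / len(dev) <= threshold:
        return None
    return start + max(range(len(dev)), key=lambda i: dev[i])


track = [(0.5 * t * t, 2.0 * t) for t in range(21)]  # exactly quadratic in x, linear in y
print(max_error_frame(track, 0, 20))  # None
```

As the example above notes, the frame of maximum error is only a heuristic for where the model parameters should change, which is why a merge pass follows the splits.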
Pseudocode Outline:

struct Section {
    StartFrame;
    EndFrame;
    MotionParameters;
    Error;
    ValidSection;
}

Main(Object) {
    AvgPos = average spatial position of Object at each frame;
    Initialize NumSections;
    Initialize SectionParameters;    /* an array of type Section */
    NumSections = Split(SectionParameters, NumSections, AvgPos, Threshold);
    NumSections = Merge(SectionParameters, NumSections, AvgPos, Threshold);
}

Split(SectionParameters, NumSections, AvgPos, Threshold) {
    Calculate NumFrames in current interval;
    if (NumFrames <= 3)
        Calculate and store SectionParameters[NumSections];    /* parameters of the current Section */
    else {
        Calculate and store SectionParameters[NumSections];
        if (AvgError > Threshold) {
            Find location of max error of parametric model;    /* this indicates our split point */
            NumSections++;
            Set StartFrame and EndFrame of SectionParameters[NumSections];    /* based on location of split point */
            NumSections = Split(SectionParameters, NumSections, AvgPos, Threshold);
            NumSections++;
            Set StartFrame and EndFrame of SectionParameters[NumSections];    /* based on location of split point */
            NumSections = Split(SectionParameters, NumSections, AvgPos, Threshold);
        }
    }
    return NumSections;
}

Merge(SectionParameters, NumSections, AvgPos, Threshold) {
    for (CurrentSection = 0; CurrentSection < NumSections - 1; CurrentSection++)
        while (CurrentSection and NextSection can be merged into one) {    /* based on calculating MergedError < Threshold */
            Calculate and store SectionParameters[CurrentSection];
            Shift SectionParameters[NextSection + 1] through SectionParameters[NumSections] down 1 index;
            NumSections--;
        }
    return NumSections;
}
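The recursion of the Split routine can be sketched in Python as below. The sketch assumes a hypothetical callback `find_split(start, end)` that returns the last frame of the first sub-interval (the location of maximum model error) or `None` when no split is needed; the `Section` fields are a simplification of the struct above. Sub-sections are appended in the same depth-first order as the pseudocode's `NumSections++`, so replaying the split points of the frames 0-20 example reproduces the numbering in which Sections 0 and 2 are not valid and Sections 1, 3 and 4 are.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Section:
    start_frame: int
    end_frame: int
    valid: bool = True  # becomes False once the section has been split


def split(sections: List[Section], index: int,
          find_split: Callable[[int, int], Optional[int]]) -> None:
    """Recursively split sections[index], appending sub-sections depth-first."""
    sec = sections[index]
    cut = find_split(sec.start_frame, sec.end_frame)
    if cut is None:
        return  # model error acceptable: this section stays valid
    sec.valid = False  # only its sub-sections are passed on to merging
    for start, end in ((sec.start_frame, cut), (cut + 1, sec.end_frame)):
        sections.append(Section(start, end))
        split(sections, len(sections) - 1, find_split)


def split_sequence(first: int, last: int,
                   find_split: Callable[[int, int], Optional[int]]) -> List[Section]:
    sections = [Section(first, last)]
    split(sections, 0, find_split)
    return sections


# Replay the split points of the frames 0-20 example (frame 5, then frame 8).
cuts = {(0, 20): 5, (6, 20): 8}
secs = split_sequence(0, 20, lambda s, e: cuts.get((s, e)))
print([(i, s.start_frame, s.end_frame, s.valid) for i, s in enumerate(secs)])
# [(0, 0, 20, False), (1, 0, 5, True), (2, 6, 20, False), (3, 6, 8, True), (4, 9, 20, True)]
```

Keeping the invalid parent sections in the list, rather than deleting them, mirrors the `ValidSection` flag of the struct; a merge pass would then operate only on the valid entries.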

The above method illustrates a general approach that allows the object's motion to be modeled by any parametric model. As long as some kind of error metric can be computed to determine whether or not the object's true motion is well described by the parametric model, the same kind of split and merge technique can be used to find the boundary points at which the model's parameters need to be updated. Although the present invention is described using an affine motion model that describes an object's translational, rotational, shear, and zoom characteristics, it is not so limited.

Below are the standard 2-dimensional affine motion equations, where v_x and v_y represent the velocity at position (x, y), C and F represent the translational motion components, and A, B, D, and E describe the rotational, shear, and zoom components:

$$v_x = Ax + By + C \tag{2}$$

$$v_y = Dx + Ey + F \tag{3}$$
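To illustrate how the six affine coefficients in v_x = Ax + By + C and v_y = Dx + Ey + F might be obtained from block-based motion vectors, the sketch below fits them by ordinary least squares. Least squares is an assumption here (the description does not name an estimator), as is the input format: each block is assumed to contribute its centre (x, y) and its motion vector (v_x, v_y). The helper names `solve3` and `fit_affine` are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], r: List[float]) -> List[float]:
    """Solve the 3x3 system m x = r by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    r = list(r)
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 3):
                m[row][k] -= f * m[col][k]
            r[row] -= f * r[col]
    x = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        x[row] = (r[row] - sum(m[row][k] * x[k] for k in range(row + 1, 3))) / m[row][row]
    return x


def fit_affine(samples: Sequence[Tuple[float, float, float, float]]) -> Tuple[float, ...]:
    """Least-squares (A, B, C, D, E, F) for v_x = A*x + B*y + C, v_y = D*x + E*y + F,
    given (x, y, v_x, v_y) samples from at least three non-collinear blocks."""
    rows = [(x, y, 1.0) for x, y, _, _ in samples]
    # Both velocity components share the normal matrix sum(phi * phi^T), phi = (x, y, 1).
    m = [[sum(p[i] * p[j] for p in rows) for j in range(3)] for i in range(3)]
    rhs_x = [sum(p[i] * s[2] for p, s in zip(rows, samples)) for i in range(3)]
    rhs_y = [sum(p[i] * s[3] for p, s in zip(rows, samples)) for i in range(3)]
    A, B, C = solve3(m, rhs_x)
    D, E, F = solve3(m, rhs_y)
    return A, B, C, D, E, F


# Hypothetical 4x4 grid of block centres moving under a known affine field.
grid = [(x, y) for x in range(0, 64, 16) for y in range(0, 64, 16)]
mvs = [(x, y, 0.1 * x - 0.2 * y + 3.0, 0.05 * x + 0.1 * y - 1.0) for x, y in grid]
print([round(p, 6) for p in fit_affine(mvs)])  # [0.1, -0.2, 3.0, 0.05, 0.1, -1.0]
```

Because v_x and v_y depend on the same regressors (x, y, 1), one normal matrix serves both components; with real, noisy block motion vectors a robust estimator could be substituted without changing the interface.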

It should be noted that the above-discussed thresholds are application specific, as each application can have a different level of tolerance to errors in trajectory. The threshold that can be used for the above quadratic case is as follows: if (sum of deviation in distance over all points in the interval)/(number of points in the interval) > 0.9, then split. Similarly, if (sum of deviation in distance over all points in the interval)/(number of points in the interval) < 0.9, then merge.

Although the present object motion segmentation and object motion trajectory segmentation are described above for improving the indexing and retrieval of image sequences, it should be understood that the present invention can be employed to improve or provide other image processing functions. For example, the present invention can be employed in image processing functions such as the synthesis of content from an object trajectory (for quick preview), given the initial texture.

Furthermore, although the present invention is described above in terms of objects, it should be understood that an object is broadly defined to be a region of interest having varying sizes depending on the application. Similarly, although the present invention is described above in terms of blocks such as macroblocks, it should be understood that a block is broadly defined to be a block of varying sizes depending on the specific application.

Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.

What is claimed is:

1. A method for performing object trajectory segmentation for an image sequence having a plurality of frames, said method comprising the steps of:
a) obtaining at least one optical flow motion parameter for at least one pixel of an object within the image sequence;
b) choosing a model for the trajectory of said at least one optical flow motion parameter as a function of time over an interval of said image sequence;
c) determining trajectory model parameters from said modeled optical flow motion parameter for said interval of said image sequence; and
d) evaluating said trajectory model parameters to determine if said interval of said image sequence is to be split, wherein said split operation is applied by dividing said interval of said image sequence into at least two separate intervals of frames.

2. The method of claim 1, further comprising the step of:
e) repeating steps b) through d) for each newly created interval resulting from said split operation.

3. The method of claim 2, further comprising the step of:
f) evaluating said trajectory model parameters of two adjacent intervals to determine if said two intervals of said image sequence are to be merged.

4. The method of claim 3, wherein said evaluating step f) evaluates said trajectory model parameters against a threshold to determine if said two intervals of said image sequence are to be merged.

5. The method of claim 3, further comprising the steps of:
g) indexing said object of the image sequence in accordance with said trajectory model parameters and said intervals.

6. The method of claim 5, wherein said indexing step g) indexes said object of the image sequence in accordance with said trajectory model parameters and said intervals in conjunction with a spatial information associated with said object.

7. The method of claim 3, wherein said at least one optical flow motion parameter is at least one of a plurality of affine motion parameters.

8. The method of claim 7, wherein said plurality of affine motion parameters corresponds to scale, rotation, shear and translation.

9. The method of claim 1, wherein said modeling step b) employs a polynomial model over time.

10. The method of claim 9, wherein said modeling step b) employs a linear model over time.

11. The method of claim 9, wherein said modeling step b) employs a quadratic model over time.

12. The method of claim 11, wherein said quadratic model over time is expressed as:

$$x(t) = 0.5\,a_x t^2 + v_x t + x_0$$

$$y(t) = 0.5\,a_y t^2 + v_y t + y_0$$

where a_x and a_y represent acceleration, v_x and v_y represent velocity, and x_0 and y_0 represent an initial position.

13. The method of claim …, wherein said evaluating step d) evaluates said trajectory model parameters against a threshold to determine if said interval of said image sequence is to be split.

14. The method of claim 13, wherein said evaluating step applies said split operation at a location of maximum error.

15. An apparatus for performing object trajectory segmentation for an image sequence having a plurality of frames, said apparatus comprising:
means for obtaining at least one optical flow motion parameter for at least one pixel of an object within the image sequence;
means for choosing a model for the trajectory of said at least one optical flow motion parameter as a function of time over an interval of said image sequence;
means for determining model parameters from said modeled optical flow motion parameter for said interval of said sequence; and
means for evaluating said trajectory parameters to determine if said interval of said image sequence is to be split, wherein said split operation is applied by dividing said interval of said image sequence into at least two separate intervals of frames.

16. The apparatus of claim 15, wherein said evaluation means further evaluates said trajectory model parameters of two adjacent intervals to determine if said two intervals of said image sequence are to be merged.

17. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform the steps comprising of:
a) obtaining at least one optical flow motion parameter for at least one pixel of an object within the image sequence;
b) choosing a model for the trajectory of said at least one optical flow motion parameter as a function of time over an interval of said image sequence;
c) determining trajectory model parameters from said
modeled optical flow motion parameter for said interval 
of said image sequence; and 

d) evaluating said trajectory model parameters to deter- 
mine if said interval of said image sequence is to be 
split, wherein said split operation is applied by dividing 
said interval of said image sequence into at least two 
separate intervals of frames. 

18. The computer-readable medium of claim 17, further 
comprising the step of: 




f) evaluating said trajectory model parameters of two 
adjacent intervals to determine if said two intervals of 
said image sequence are to be merged. 

19. The computer-readable medium of claim 17, wherein 
said at least one optical flow motion parameter is at least one 
of a plurality of affine motion parameters. 

20. The computer-readable medium of claim 17, wherein 
said modeling step b) employs a polynomial model over 
time. 